Auditing Fairness by Betting
We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. Whereas previous work relies on a fixed sample size, our methods are sequential and allow for the continuous monitoring of incoming data, making them highly amenable to tracking the fairness of real-world systems. We also allow the data to be collected by a probabilistic policy as opposed to sampled uniformly from the population. This enables auditing to be conducted on data gathered for another purpose. Moreover, this policy may change over time and different policies may be used on different subpopulations. Finally, our methods can handle distribution shift resulting from either changes to the model or changes in the underlying population. Our approach is based on recent progress in anytime-valid inference and game-theoretic statistics, in particular the "testing by betting" framework. These connections ensure that our methods are interpretable, fast, and easy to implement. We demonstrate the efficacy of our approach on three benchmark fairness datasets.
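The sequential idea in this abstract can be illustrated with a minimal sketch of a betting-style test. It assumes model outputs lie in [0, 1] and arrive as pairs, one from each of two groups, and it tests the null hypothesis that the two group means are equal. The function name `betting_audit`, the step size `lr`, and the gradient-ascent bet update are illustrative assumptions, not necessarily the strategy the authors use. Under the null the wealth process is a nonnegative martingale, so by Ville's inequality it exceeds 1/alpha at any time with probability at most alpha.

```python
def betting_audit(pairs, alpha=0.05, lr=0.1):
    """Return the 1-based step at which the null is rejected, or None.

    pairs: iterable of (y0, y1), model outputs in [0, 1] for one
    individual from each group (hypothetical input format).
    """
    wealth = 1.0   # the auditor starts with one unit of wealth
    lam = 0.0      # current bet; chosen before the next pair is seen
    for t, (y0, y1) in enumerate(pairs, start=1):
        g = y0 - y1                  # payoff in [-1, 1]; mean zero under the null
        wealth *= 1.0 + lam * g      # stays positive because |lam| <= 0.5
        if wealth >= 1.0 / alpha:
            return t                 # enough evidence against the null
        # Assumed update: gradient ascent on log(1 + lam * g), clipped to [-0.5, 0.5].
        grad = g / (1.0 + lam * g)
        lam = max(-0.5, min(0.5, lam + lr * grad))
    return None


# A persistent gap between groups is flagged after a handful of pairs,
# while identical outputs never move the wealth away from 1.
unfair_stop = betting_audit([(1.0, 0.0)] * 100)
fair_stop = betting_audit([(0.5, 0.5)] * 1000)
```

Because the test can be checked after every pair, the audit can stop as soon as the threshold is crossed instead of waiting for a fixed sample size.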
Auditing Fairness under Unobserved Confounding
Byun, Yewon, Sam, Dylan, Oberst, Michael, Lipton, Zachary C., Wilder, Bryan
A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which are difficult to obtain in the realistic setting of unobserved confounding. If these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even when relaxing or (surprisingly) eliminating the assumption that all relevant risk factors are observed. We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any allocation, to derive unbiased estimates of risk. We demonstrate the effectiveness of our framework on a real-world study of Paxlovid allocation to COVID-19 patients, finding that observed racial inequity cannot be explained by unobserved confounders of the same strength as important observed covariates.
Under the Radar -- Auditing Fairness in ML for Humanitarian Mapping
Kondmann, Lukas, Zhu, Xiao Xiang
Humanitarian mapping from space with machine learning helps policy-makers identify people in need in a timely and accurate manner. However, recent concerns around fairness and transparency of algorithmic decision-making are a significant obstacle to applying these methods in practice. In this paper, we study whether humanitarian mapping approaches from space are prone to bias in their predictions. We map village-level poverty and electricity rates in India based on nighttime lights (NTLs) with linear regression and random forest and analyze whether the predictions systematically show prejudice against scheduled caste or tribe communities. To achieve this, we design a causal approach to measure counterfactual fairness based on propensity score matching. This allows us to compare villages within a community of interest to synthetic counterfactuals. Our findings indicate that poverty is systematically overestimated and electricity systematically underestimated for scheduled tribes in comparison to a synthetic counterfactual group of villages. The effects have the opposite direction for scheduled castes, where poverty is underestimated and electrification overestimated. These results are a warning sign for a variety of applications in humanitarian mapping where fairness issues would compromise policy goals.
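The matching step described in this abstract might be sketched as follows. The sketch assumes propensity scores have already been estimated for every village, and it uses one-to-one nearest-neighbour matching with replacement, which is an assumption rather than the authors' documented procedure. The function name `matched_error_gap` and the tuple layout are hypothetical.

```python
def matched_error_gap(treated, controls):
    """Mean prediction-error gap between treated units and matched controls.

    treated, controls: lists of (propensity_score, prediction_error).
    Each treated unit is matched to the control with the closest score
    (with replacement); the matched controls stand in for the synthetic
    counterfactual group.
    """
    gaps = []
    for score, err in treated:
        nearest = min(controls, key=lambda c: abs(c[0] - score))
        gaps.append(err - nearest[1])
    return sum(gaps) / len(gaps)


# Toy data: villages in the community of interest vs. all other villages.
treated = [(0.30, 0.2), (0.70, 0.4)]
controls = [(0.31, 0.1), (0.69, 0.1), (0.50, 0.9)]
gap = matched_error_gap(treated, controls)
```

A positive gap means the model's error is larger for the community of interest than for comparable villages outside it; a negative gap means the opposite.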